Tied Mixtures in the Lincoln Robust CSR
Abstract
HMM recognizers using either a single Gaussian or a Gaussian mixture per state have been shown to work fairly well for 1000-word vocabulary continuous speech recognition. However, the large number of Gaussians required to cover the entire English language makes these systems unwieldy for large vocabulary tasks. Tied mixtures offer a more compact way of representing the observation pdfs. We have converted our independent mixture systems to tied mixtures and have obtained mixed results: a 13% improvement in speaker-dependent recognition without cross-word triphone models, but no improvement in our speaker-dependent system with cross-word boundary triphone models or in our speaker-independent system. There is also a reduction in CPU requirements during recognition, but this is counterbalanced by an increase during training. This paper also includes a comment on the validity of the DARPA program's evaluation test system comparisons.
INTRODUCTION
Single Gaussian per state speaker-dependent (SD) HMM recognizers and low-order Gaussian mixture per state speaker-independent (SI) HMM recognizers have been shown to work fairly well for 1000-word vocabulary continuous speech recognition [10,11]. However, an SD system would require about 30,000 Gaussians to cover the word-internal triphones of English, and an SI system would require at least 100,000. The strategy of one or more individual Gaussians per state is appropriate for small vocabulary systems, but becomes unwieldy for large vocabulary systems. Interpolation is often required to cluster models, to smooth models, or to predict models which are not observed in training, but there is no clean strategy for interpolating independent Gaussian mixtures: either the mean(s) are changed or the mixture order increases each time another model is included in an interpolated model. Tied mixtures [3,2,4] offer a solution for these problems while retaining a basic continuous observation HMM system.
(Gaussian tied mixtures are mixtures which share a common pool of Gaussians.) They are mixtures, and thus avoid the unimodal distribution limitation of single Gaussians. Unlike independent mixtures, they interpolate well by interpolating the weights of the corresponding Gaussians. And since the pool of Gaussians is of a given size, a mixture order cannot exceed this size. In effect, they form a middle ground between the histograms of discrete observation systems and non-tied-mixture systems. Tied mixtures can also be viewed as a discrete observation system modified to allow a simultaneous match to many templates with the degree of template match included. In contrast to the discrete observation system, there is no quantization error and the "templates" (Gaussians) can be jointly optimized with the rest of the HMM.
1 This work was sponsored by the Defense Advanced Research Projects Agency.
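The idea of a shared Gaussian pool with per-state weights, and of interpolating states through their weights, can be illustrated with a small sketch. This is not the Lincoln implementation: the function names, the diagonal-covariance assumption, the pool values, and the interpolation factor `lam` are all hypothetical choices made for illustration only.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at vector x (assumed form)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def tied_mixture_pdf(x, weights, pool):
    """Observation pdf of one state: weighted sum over the shared pool."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, (m, v) in zip(weights, pool))

def interpolate_weights(w_a, w_b, lam):
    """Interpolate two states by mixing the weights of corresponding Gaussians."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(w_a, w_b)]

# Hypothetical shared pool of three 2-dimensional Gaussians: (mean, variance).
POOL = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([2.0, 0.0], [1.0, 0.5]),
    ([0.0, 3.0], [0.5, 1.0]),
]

# Each state stores only a weight vector over the pool, not its own Gaussians.
state_a = [0.7, 0.2, 0.1]
state_b = [0.1, 0.3, 0.6]

# The interpolated state has the same mixture order as the pool.
merged = interpolate_weights(state_a, state_b, 0.5)
```

Because every state indexes the same Gaussians, the interpolated weight vector still sums to one and its length never exceeds the pool size, which is the property the text contrasts with independent mixtures.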